
Choosing The Right Embedded Software Development Tools

Rapid growth in the embedded systems market has complicated the purchasing process.

By Tony Barbagallo


Abstract In response to the rapid growth of the embedded systems market, a profusion of software development tools has appeared on the scene. This article discusses several product features and capabilities that should be evaluated before choosing a development tool. It also describes what to look for in a tools supplier, helping you determine whether a supplier is positioned to meet both your immediate project needs and your future requirements.

Getting Today's Job Done To complete any embedded systems project, you need to be able to perform three basic steps:

  1. Build your application.
  2. Load your application into target memory.
  3. Debug and test your application.

Building an application is more complex for an embedded system than for a standard computer program. An embedded application will do a significant amount of direct interfacing with the hardware through device drivers or interrupt service routines, which requires the ability to conveniently manipulate control registers or perform other low-level operations.

Another complication with embedded software applications is that program memory and processor resources are usually constrained, mandating that the program make use of resources as efficiently as possible. Furthermore, embedded systems rarely have disks, so the application is stored in ROM or in Flash RAM. Since the application's data still requires RAM, it is common for an embedded system to have a noncontiguous memory layout. This requires that the build environment provide great flexibility in separating the application into discrete pieces.

In addition to fine-tuning the performance and size of your application, you also need to get it working correctly. This requires the ability to debug and test your code on your target hardware. Apart from obvious needs, such as the ability to do source-level debug, just establishing a debugging connection to the target can be difficult if target I/O resources are limited. You may also be faced with resolving difficult timing-related problems.

With these issues in mind, we'll take a closer look at the following key areas of development technology to determine exactly what features you need to successfully complete your current application:

  • Developing code that directly interfaces with the hardware
  • Optimizing your application code
  • ROMing your application
  • Debugging your application at the source level
  • Debugging your application in an embedded target

Developing Code That Directly Interfaces With the Hardware Due to the size of today's embedded software applications, a high-level language such as C or C++ is required to increase programmer productivity. However, some compilers are not well designed for embedded systems and do not adequately support the writing of high-level language programs that directly control the hardware.

To avoid forcing software designers to write large amounts of assembly language and to abandon the productivity gains of a high-level language, the compiler must support packed bitmaps, in-line assembly code, and writing interrupt handlers in C/C++.

Packed bitmaps allow a programmer to directly map memory-mapped hardware control registers into C data structures (see Figure 1). The compiler must be capable of packing data structures because the default data structure layout on 32-bit architectures aligns individual variables on 32-bit boundaries to minimize access times. This default alignment is achieved by inserting padding bits if the structure elements are less than 32 bits. Since the bitmaps that map onto control registers are frequently not exactly 32 bits, inserting padding bits may make it impossible to represent the control registers in a C data structure. Packing eliminates this problem by allowing structures to be laid out without padding.

Figure 1. Packing allows a programmer to represent memory-mapped registers as C bitfields.

When using integrated microcontrollers, you may have no choice but to use special assembly language instructions to perform hardware-specific functions. To simplify switching to assembly language, the compiler should support easy in-line insertion of a few assembly language instructions into a C function. The in-line assembly code should be able to refer to C variables symbolically and return a value of an appropriate C type. This makes it easy to write most of the routine in C or C++ and only switch to assembler when absolutely necessary.

The need to write interrupt handlers is commonly found in embedded systems. While it is desirable to implement the interrupt handlers as C functions, the stack must be set up differently for an interrupt handler compared to a standard function, and all scratch registers must be saved and restored. To achieve this, the compiler should offer an extension to allow the user to specify that a particular function will serve as an ISR. This enables the compiler to automatically set up the function prologue/epilogue correctly and eliminates the need to put assembly language wrappers around the function to handle the interrupt stack.

Optimizing Your Application Code Producing fast, compact code is an important attribute of any compiler. It is especially important in embedded systems because of the need to keep production costs down by using low-cost processors and minimizing memory costs. It is also important to thoroughly understand all of the different factors that can affect the quality of generated code. The best way to check code quality is to run benchmarks using your own application code. However, if you don't have time to run a comprehensive set of benchmarks, there are a number of points worth checking.

One point to check is whether a compiler offers the option to optimize for size as opposed to speed. This is important for embedded systems where code size is a major concern.

Requesting a comprehensive list of the generic optimization techniques used is helpful. In addition to generic optimizations, it is also important that a compiler be tuned to take advantage of a particular architecture and variants within that architecture. For example, CISC processors tend to benefit from a different approach to code generation than RISC processors. A compiler that takes full advantage of the more powerful instructions and addressing modes provided by a CISC instruction set can significantly increase code density, thereby reducing memory costs.

Even within a given architecture, variant-specific optimization is important. Most architectures preserve backward compatibility with a "lowest common denominator" subset of the functionality offered by the latest generation processors. If a compiler targets only this subset, significant performance or functionality can be lost. Good examples are provided by both the i960 and 68000 processor families. If a compiler treats all i960 processors the same, it cannot take advantage of the superscalar execution offered by the 960Cx and Hx variants because the low-end Kx variants are not superscalar. In this instance, applications running on a 960CA would perform about 5 percent slower. Functionality can also be lost. If a compiler treats a CPU32 processor (such as a 68360) like a vanilla 68000, an application will be restricted to 24-bit addressing rather than the 32-bit addressing allowed by the CPU32.

In addition to traditional compiler optimizations, there are a number of factors that can affect your application's program size and performance. One area that offers significant scope for improved performance and reduced code size is the run-time library. It is important that the source code is provided for the run-time library, allowing you to strip out unneeded functionality in individual modules and compile the library using the same combinations as the application, tailoring it for size or speed as appropriate.

If your application is going to use floating-point emulation libraries, the performance of these libraries may be very important in determining overall application performance.

The importance of compiler support for packing data structures was discussed earlier in the context of accurately mapping memory-mapped registers onto C data structures. Packing is also very useful if you want to minimize the memory consumed by your application, as it removes padding bits from the structure. It is especially helpful if the compiler allows you to pack structures on a case-by-case basis. You can then leave smaller, frequently accessed structures unpacked to maximize performance, but pack other structures to save space.

ROMing Your Application As noted earlier, embedded systems store an application in ROM and may have small noncontiguous blocks of memory. This contrasts with a native computer system, where an application program is stored on disk and then loaded into a contiguous area of RAM to execute. Unless they are designed specifically for embedded applications, the compiler and linker usually pay little attention to the need to precisely segment an application's code and data into many discrete parts. This is often a necessity when the application is intended to reside in a custom embedded system.

How many sections the compiler lets you generate is important. For complete flexibility, the compiler should let you generate an unlimited number of sections, including user-defined ones (see Figure 2). A good variety of standard sections should be provided, such as code, constants, strings, heap, stack, initialized, and uninitialized data. The compiler should not lump code and constants together in a single unwieldy section.

Figure 2. Embedded compilers must support precise segmentation of code and data into multiple sections.

Initialized data presents a particular challenge in a ROM-based system, because while the variables must reside in RAM, the values used to initialize the variables must be kept somewhere in ROM and copied over at system startup. A compiler and linker optimized for embedded systems should automate this process, including generating the appropriate start-up code.

While providing flexible segregation of code and data into different sections is a prerequisite for building an application that can reside in an embedded system, even more flexibility is often required. You may wish to separate the boot module from the rest of your application and place it in a separate ROM. This requires that the linker support the ability to split up sections so you can locate the boot modules in a different area of memory from the rest of your application code (see Figure 3).

Figure 3. An embedded linker should allow you to split up each section and locate individual modules into discrete memory locations.

Debugging Your Application at the Source Level Once you have built your application for your target system, the next task is to verify that the application works, and remove any defects that are detected in this process. Since it's likely your application will be mostly written in C or C++ with a small amount of assembly language, it is very important that your debugger can handle both C/C++ and assembly language debugging.

A debugger should support three different modes:

  • C/C++ source level only
  • Mixed C/C++ and assembly language
  • Assembly language source level only

Mixed C/C++ and assembly language displays are very useful when debugging C/C++ code that includes in-line assembly code or calls to assembly language routines; otherwise you must laboriously correlate between the two. It is also needed when you have to track software-hardware interaction problems that initially manifest themselves as crashes in a C program. Such problems can only be detected at a very low level.

When looking at assembly language displays, another point to check is how much symbolic information is preserved from the original language source code. Being able to see local symbols, line numbers, labels, and unexpanded macros significantly simplifies debugging. Since many debuggers use disassembly to generate assembly language displays, this information may not be available.

Before going into a detailed examination of all a debugger's features, you first need to determine how intuitive and easy to use the debugger is. Ease of use is to some extent a subjective issue. However, if you find that the most frequently used debugger commands require pulling down multiple menus or making multiple mouse clicks, it is likely the most powerful features will simply be too difficult to use. If common commands such as breakpoints or displaying memory are straightforward to use, you can proceed to take a closer look at the source-level debugging features to get a better picture of the debugger's underlying capabilities.

The debugger's ability to display complex data structures is important to investigate. The debugger should be able to display C structs or C++ classes in a well laid-out, easy-to-read manner. Enumerated types should be displayed using the user-defined value names rather than the representation generated by the compiler (for example, the compiler might translate red and blue into 0 and 1). Another feature to look for is whether the debugger automates traversing pointer-based dynamic data structures, letting you walk through structures such as queues, linked lists, or trees, which are commonly used in C/C++ programs.

Another useful point to check is whether the debugger can display local variables, whether they are stack- or register-based. Preferably, you should be able to display a stack traceback and easily expand each stack frame to look at any local variables of interest.

The final issue to verify is the level of functionality provided by the debugger when compiler optimization is enabled. If you cannot debug optimized code, you have a problem because your debugging and testing efforts are no longer focused on the final application code. Since embedded systems have to be highly reliable, this is not a desirable situation.

Debugging Your Application in an Embedded Target To debug your software in an embedded target, you will need to connect to your target. Several different technologies are available to help debug target-resident software. In this section, the different merits of each embedded debugging technique will be discussed to help you determine the best mix of tools for debugging your application.

The most cost-effective solution for a team of engineers is likely to be a mix of tools. Therefore, it is highly desirable that any source-level debugger you purchase works with a variety of different cross-debugging tools. Otherwise, you may end up having to learn multiple debuggers.

Five different ways of debugging software in an embedded system will be discussed:

  • Software monitor
  • Background debug mode connection
  • ROM emulator
  • In-circuit emulators
  • Logic analyzers

Because microprocessors are including an increasing amount of debugging capability on-chip, the distinction between the capabilities offered by each of these technologies is becoming blurred. Therefore, it's important not to make many assumptions about a particular product or technology. However, the sections below give you a general overview of the relative benefits of each technology.

Software Monitor The lowest cost technology for connecting to an embedded target is a software monitor (often referred to as a ROM monitor). Software monitors require some target memory resources in which the monitor and its drivers reside. They also require the presence of an I/O port, such as an ethernet or serial port, to communicate between the host computer, where the source-level debugger resides, and the embedded target. Since the software monitor runs on the target processor, it must also steal CPU cycles to process debugger commands. A final requirement is that any application code in which you wish to set breakpoints must reside in RAM, because the monitor has to patch software interrupts into the code to stop the application and gain control of the processor.

Software monitors provide a good solution for non-real-time cross-debugging. You can download code or commands, and start and stop target execution. This enables you to perform many basic debugger functions such as setting breakpoints, single-stepping, or examining memory.

A very important point to check when purchasing a software monitor is the driver and board support. Before you can use a monitor-based debugging solution, you must first bring up the monitor on your board. If you are using an off-the-shelf VME, embedded PC, or evaluation board, the vendor should be able to provide a board support package that allows you to immediately bring up the monitor and connect to a host computer. If you have a custom board, it is helpful if the vendor can provide a working sample device driver for the communication device on your board, along with some sample initialization code. Otherwise, bringing up the initial connection can take considerable time.

A number of processors, for example the 386 and i960, support debug registers that a software monitor can use. These registers enable a monitor to set breakpoints on reads and writes to variables or on code in ROM--features usually associated with an emulator.

Background Debug Mode Background Debug Mode (BDM) is an on-chip debug monitor provided by members of Motorola's CPU32 and CPU32+ families (68340, 68360, and 68332). Comparable debugging capabilities are being introduced on the PowerPC and ColdFire processor families as well.

BDM provides debugging functionality similar to a software monitor, such as execution control, and reading and writing target memory. However, it has several advantages over a software monitor approach. Since the BDM is on-chip, no target memory resources are required. Another advantage is that there is no requirement to use a target I/O port, which also eliminates the need for a board support package. This allows you to instantly connect to your target board provided the 4-pin BDM connection has been exposed.

BDM solutions come in several forms. The least expensive to use is a simple cable that connects to a PC parallel port. This cable typically costs around $300. Several cross-debugging vendors provide a BDM configuration based on this. However, In-Circuit Emulator (ICE) vendors also have taken advantage of BDM to provide relatively low-cost products that support additional functionality such as ethernet connections. These products are discussed later.

ROM Emulators ROM emulators provide a relatively low cost (up to about $5,000) way to connect to custom embedded boards. Since ROM emulators do not require a target I/O port to connect to a board, they are very useful if you are designing a device such as a cellular phone that does not have serial or ethernet ports you can use as debug connections. Even if a serial port is available, a ROM emulator can significantly improve programming productivity by providing faster downloads through an ethernet connection.

The major attraction of a ROM emulator over even a low-cost ICE-like product is the ease with which it can be retargeted to new systems. Since there are no processor dependencies, a ROM emulator can work with almost any system, provided it is capable of emulating the ROM. Supporting a different type of ROM socket typically costs very little (up to about $750 unless higher-speed ROMs need to be supported).

ROM emulators connect to the target by plugging into a ROM socket on the target board. Connecting the ROM emulator is a fairly straightforward process, although more involved than with an ICE because you have to configure a software monitor to work with the ROM emulator. Unlike an ICE, a ROM emulator requires that the target have a functioning memory system.

ROM emulators contain emulation memory that eliminates the need to blow ROMs whenever you have to download a new version of your application. This memory is only suitable for emulating the target ROMs and cannot emulate the target RAM like an ICE can.

As with software monitors or basic BDM debuggers, ROM emulators provide execution control and the ability to read and write memory, but they do not provide real-time debugging capabilities.

In-Circuit Emulators ICEs are typically the most powerful and effective tools for debugging embedded applications, offering a superset of the functionality provided by the other tools discussed here. ICEs allow both basic debug monitor functions, such as execution and reading/writing memory, and real-time debugging. As one might expect, this high level of functionality comes at a price, with most true ICEs (for 32-bit processors) costing between $10,000 and $25,000.

For cost reasons, it is not usually practical to equip every member of a large team with an ICE. However, it is a good investment to make sure that your most experienced embedded developers have access to ICEs for resolving hardware/software integration problems.

Like a ROM emulator, an ICE does not require any target I/O resources. Most ICEs can also offer fast downloads through an ethernet link. Although there can sometimes be electrical or mechanical problems that make it difficult to use an ICE, connecting to your target is straightforward. Since the ICE emulates the processor in your system, you simply remove your processor and plug in the ICE to establish a connection. There is no need to build a board support package.

ICEs are especially useful in two phases of the project: bringing up low-level routines (such as drivers and ISRs) and real-time integration.

Typically, initialization routines such as drivers and ISRs are the first code downloaded onto the board. A major advantage of an ICE is that it permits you to download and debug code even if the board is only partially functional. This allows you to begin low-level software development earlier, as you do not need to wait for the final version of the hardware. It also lets you identify problems in the hardware earlier, because you can start verifying whether test routines work. To support hardware verification, many ICEs supply short routines to ensure that the target memory is functioning.

As with ROM emulators, ICEs supply emulation memory that enables you to run your application on your target without blowing ROMs. An advantage of an ICE is that you can still use this memory if the target's bus is not completely functioning. The emulation memory is fast enough to emulate target RAM as well.

If you have code actually residing in ROM, ICEs offer another unique capability--the ability to set breakpoints in ROM-based code even if the processor lacks special debug registers to support this.

True ICEs offer many powerful features for real-time integration. The most important of these is real-time trace. Trace allows you to study the sequence of execution before or after a particular event. It is especially useful for resolving system crashes as you can study events leading up to the crash.

The usefulness of a trace system depends on the following features:

  • Ability to view the trace without stopping the processor
  • Depth of trace buffer
  • Triggering and trace qualification mechanism
  • Trace processing algorithms

The ability to view the trace buffer without stopping the processor is critical in applications such as engine control where the processor cannot be stopped without risking damage to the target system.

Since the cause of a crash or a system malfunction may happen a long time before the failure itself, the depth of the trace buffer and the trace qualification system are very important to help track down obscure problems. A large trace buffer is more likely than a small one to show the original cause of a crash. However, being able to discriminate what information is being captured in the trace buffer allows you to track down problems more quickly. For example, if a memory location is being corrupted, you may want to look only at writes to that particular memory location. It is important to study the event machine, which determines when trace acquisition is triggered. Many emulators offer 4 by 4 event machines that allow the capture of execution activity around events that only occur occasionally. This is helpful when trying to pinpoint the cause of intermittent failures.

Trace processing is an issue to consider when comparing real-time trace. If your application is written in C/C++, it is preferable to be able to see source code lines in the trace so you can easily relate it to your original program. Another feature that makes execution trace easier to read is a dequeuing algorithm. A dequeuing algorithm determines which instructions were prefetched from memory but never executed and then removes them. As a result, the trace accurately reflects what was executed rather than what instructions were fetched from memory. Some trace-processing algorithms are now sophisticated